{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "import spacy\n",
    "nlp = spacy.load('en_coref_sm')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "为了简洁地展现共指消解的应用，这里将把问答系统部分的难度降到最低，直接有了问题-答案的字典映射。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "QA = {\"Who is Abraham Lincoln?\":\"An American statesman and lawyer who served as the 16th President of the United States.\",\n",
    "      \"When was Abraham Lincoln born?\":\"February 12, 1809.\",\n",
    "      \"Where is Abraham Lincoln's hometown?\":\"Hodgenville, Kentucky\"}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;这些问题没有办法应付代词，然而人在有上下文的对话中使用代词是再自然不过的事了。用共指消解就可以解决这个问题。我们会把每一次的问答记录都记录在上下文中，这样我们就可以用共指消解把之前提到的对象再搬到后面的代词里来，使得有代词的问题也可以与原始模板匹配。\n",
    "\n",
    "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;让我们先实验一下这个想法是否可行。"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Abraham Lincoln: [Abraham Lincoln, he, his]]\n",
      "Who is Abraham Lincoln? When was Abraham Lincoln born? Where is Abraham Lincoln hometown?\n"
     ]
    }
   ],
   "source": [
    "para = \"Who is Abraham Lincoln? When was he born? Where is his hometown?\"\n",
    "doc = nlp(para)\n",
    "print(doc._.coref_clusters)\n",
    "print(doc._.coref_resolved)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;效果不错，he和his都识别了出来。然而这边还有一个问题，就是物主代词his再被翻译回来时没有按照语法规则恢复's。另外，我们也要能够把修改好单独问句再从上下文中“抽出来”。所以我们要自己写一个函数，用到mention.start_char这些属性来手动完成替换和考虑些特殊情况。\n",
    "\n",
    "&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;我们的最终目标是，实现一个直观的answer(question)函数，直接根据当前的问题给出答案，实现如下："
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "context = \"\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
    "def my_coref(orig_text,to_replace):\n",
    "    left = 0\n",
    "    processed_text = \"\"\n",
    "    for beg,end,mention in to_replace:\n",
    "        processed_text += orig_text[left:beg] + mention\n",
    "        left = end\n",
    "    processed_text += orig_text[left:]\n",
    "    return processed_text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "def answer(question):\n",
    "    global context\n",
    "    start_pos = len(context)\n",
    "    context += (question + \" \")\n",
    "    print(\"context:\",context)\n",
    "    if question in QA: \n",
    "        return QA[question]\n",
    "    else:\n",
    "        doc = nlp(context)\n",
    "        if doc._.has_coref:\n",
    "            print(doc._.coref_clusters)\n",
    "            to_replace = []\n",
    "            for clust in doc._.coref_clusters:\n",
    "                main_mention = clust.main\n",
    "                for mention in clust.mentions:\n",
    "                    beg, end = mention.start_char - start_pos, mention.end_char - start_pos\n",
    "                    if end > 0:                                     # 是本句中的指代\n",
    "                        if mention.text in [\"its\",\"his\",\"her\",\"my\",\"your\",\"our\",\"their\"]:\n",
    "                            to_replace.append((beg,end,main_mention.text+\"'s\"))\n",
    "                        else:\n",
    "                            to_replace.append((beg,end,main_mention.text))\n",
    "            to_replace = sorted(to_replace)                         # 按照起始位置升序排序，为逐个替换做准备\n",
    "            question2 = my_coref(question,to_replace)\n",
    "            print(\"new question:\",question2)\n",
    "            if question2 in QA:\n",
    "                return QA[question2]\n",
    "                    \n",
    "    return \"I don't know.\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "context: Who is Abraham Lincoln? \n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'An American statesman and lawyer who served as the 16th President of the United States.'"
      ]
     },
     "execution_count": 7,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "answer(\"Who is Abraham Lincoln?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "context: Who is Abraham Lincoln? When was he born? \n",
      "[Abraham Lincoln: [Abraham Lincoln, he]]\n",
      "new question: When was Abraham Lincoln born?\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'February 12, 1809.'"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "answer(\"When was he born?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "context: Who is Abraham Lincoln? When was he born? Where is his hometown? \n",
      "[Abraham Lincoln: [Abraham Lincoln, he, his]]\n",
      "new question: Where is Abraham Lincoln's hometown?\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "'Hodgenville, Kentucky'"
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "answer(\"Where is his hometown?\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python [conda env:py36]",
   "language": "python",
   "name": "conda-env-py36-py"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.7"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 2
}
